Missing Data

Dean Adams, Iowa State University

16 October, 2019

The Problem of Missing Data

Missing Data

Dealing with Missing Data: Delete Specimens


Dealing with Missing Data: Delete Landmarks


Estimate Missing Data

See Gunz et al. (2009). J. Hum. Evol.

1: Exploiting Symmetry


Note: the ‘reflectMissingLandmarks’ function in the R package StereoMorph may be used to estimate missing landmarks from their bilateral counterparts.
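The idea behind symmetry-based estimation can be sketched in a few lines of Python. This is an illustration only, not the StereoMorph implementation: the function name `reflect_missing` is hypothetical, and the sketch assumes the specimen has already been aligned so that the plane of bilateral symmetry is `x = 0`. In practice the symmetry plane would be estimated from the midline and paired landmarks.

```python
def reflect_missing(landmarks, pairs):
    """Estimate missing paired landmarks by mirroring across the plane x = 0.

    landmarks: list of coordinate tuples, with None marking a missing landmark.
    pairs: list of (left_index, right_index) tuples of bilateral counterparts.
    Assumes (hypothetically) that the specimen is already aligned so that
    the symmetry plane is x = 0.
    """
    filled = list(landmarks)
    for left, right in pairs:
        # Try both directions: left from right, and right from left.
        for missing, present in ((left, right), (right, left)):
            if filled[missing] is None and filled[present] is not None:
                x, *rest = filled[present]
                filled[missing] = (-x, *rest)  # flip the sign of x only
    return filled
```

A landmark whose counterpart is also missing stays `None`, which is the case where symmetry alone cannot help.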

Exploiting Symmetry: Example


Exploiting Symmetry: Test Procedure


\(\small{D}_{Proc}= 0.009\). Pretty good!
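The \(D_{Proc}\) values reported in these test procedures are Procrustes distances between configurations, presumably between a specimen with known landmarks and the same specimen after a landmark was removed and re-estimated. As a rough sketch, the partial Procrustes distance between two 2D configurations can be computed with complex arithmetic. The function name `procrustes_distance` is hypothetical, reflections are not allowed, and the values on the slides may have been computed slightly differently (for example, from tangent-space coordinates).

```python
import math


def procrustes_distance(a, b):
    """Partial Procrustes distance between two 2D landmark configurations.

    a, b: lists of (x, y) tuples with matching landmark order.
    Each configuration is centered and scaled to unit centroid size;
    the optimal rotation is then handled analytically.
    """
    def preshape(points):
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [v - centroid for v in z]
        size = math.sqrt(sum(abs(v) ** 2 for v in z))  # centroid size
        return [v / size for v in z]

    za, zb = preshape(a), preshape(b)
    # |sum(za * conj(zb))| is the inner product after optimal rotation.
    inner = abs(sum(u * v.conjugate() for u, v in zip(za, zb)))
    # max() guards against tiny negative values from rounding error.
    return math.sqrt(max(0.0, 2.0 - 2.0 * inner))
```

Two configurations that differ only in position, scale, and rotation give a distance of (numerically) zero, so any non-zero value between the original and estimated specimen reflects estimation error.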

Exploiting Symmetry

2: Mean Substitution

See Arbour and Brown (2014). Methods Ecol. Evol.
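Mean substitution replaces each missing landmark with the average position of that landmark in the specimens where it was observed. The following Python sketch illustrates the idea under the assumption that all specimens are already in a common coordinate system (for example, after superimposition on the shared landmarks); the function name `mean_substitute` is hypothetical.

```python
def mean_substitute(specimens):
    """Fill missing landmarks (None) with the mean of the observed values.

    specimens: list of specimens, each a list of coordinate tuples or None.
    Assumes all specimens share one coordinate system and landmark order.
    """
    n_landmarks = len(specimens[0])
    filled = [list(spec) for spec in specimens]
    for j in range(n_landmarks):
        observed = [spec[j] for spec in specimens if spec[j] is not None]
        if not observed:
            continue  # no specimen has this landmark; leave it missing
        dims = len(observed[0])
        mean = tuple(
            sum(pt[d] for pt in observed) / len(observed) for d in range(dims)
        )
        for spec in filled:
            if spec[j] is None:
                spec[j] = mean
    return filled
```

Because the estimate ignores everything else about the incomplete specimen, it pulls that specimen toward the sample mean, which is consistent with the poor test results reported in the examples that follow.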

Mean Substitution: Example

Mean Substitution: Example 1

Mean Substitution: Example 1 (Cont.)

Test Procedure

\(\small{D}_{Proc_{Ref-Orig}} = 0.26\) \(\small{D}_{Proc_{Ref-Est}} = 0.23\)

\(\small{D}_{Proc_{Orig-Est}} = 0.13\) Not good at all!

Mean Substitution: Example 2

Mean Substitution: Example 2 (Cont.)

Test Procedure

\(\small{D}_{Proc_{Ref-Orig}} = 0.15\) \(\small{D}_{Proc_{Ref-Est}} = 0.13\)

\(\small{D}_{Proc_{Orig-Est}} = 0.08\) Not good at all!

Mean Substitution


3: TPS Interpolation


See Bookstein et al. (1999). New Anat.; Gunz et al. (2009). J. Hum. Evol.

TPS Interpolation: Concept
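In outline, a complete reference configuration is warped onto the incomplete specimen using the landmarks present in both, and the missing landmarks are taken from the warped reference. The Python sketch below illustrates this for 2D data with a standard thin-plate spline. It is not the code behind these slides: the names `tps_estimate` and `_solve` are hypothetical, and the sketch assumes at least three non-collinear shared landmarks.

```python
import math


def _solve(A, B):
    """Solve A X = B (B has several columns) by Gaussian elimination."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]  # partial pivoting
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):  # back substitution
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def _U(p, q):
    """2D thin-plate spline kernel, U(r) = r^2 log(r^2)."""
    r2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def tps_estimate(reference, target):
    """Fill None entries of `target` by warping `reference` onto it."""
    present = [i for i, pt in enumerate(target) if pt is not None]
    src = [reference[i] for i in present]
    dst = [target[i] for i in present]
    k = len(src)
    A = [[0.0] * (k + 3) for _ in range(k + 3)]
    B = [[0.0, 0.0] for _ in range(k + 3)]
    for i in range(k):
        for j in range(k):
            A[i][j] = _U(src[i], src[j])
        A[i][k:k + 3] = [1.0, src[i][0], src[i][1]]  # affine part
        A[k][i], A[k + 1][i], A[k + 2][i] = 1.0, src[i][0], src[i][1]
        B[i] = list(dst[i])
    W = _solve(A, B)  # rows 0..k-1: bending weights; rows k..k+2: affine

    def warp(p):
        out = []
        for d in range(2):
            val = W[k][d] + W[k + 1][d] * p[0] + W[k + 2][d] * p[1]
            val += sum(W[i][d] * _U(p, src[i]) for i in range(k))
            out.append(val)
        return tuple(out)

    return [pt if pt is not None else warp(reference[i])
            for i, pt in enumerate(target)]
```

The reference would typically be the mean shape or a complete specimen. Observed landmarks are returned unchanged; only the `None` entries are replaced.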


TPS Interpolation: Example 1

\(\small{D}_{Proc_{Ref-Orig}} = 0.26\) \(\small{D}_{Proc_{Ref-Est}} = 0.27\)

\(\small{D}_{Proc_{Orig-Est}} = 0.003\) MUCH Better!

TPS Interpolation: Example 2

\(\small{D}_{Proc_{Ref-Orig}} = 0.15\) \(\small{D}_{Proc_{Ref-Est}} = 0.15\)

\(\small{D}_{Proc_{Orig-Est}} = 0.011\) MUCH Better!

TPS Interpolation

Advantages
- Exploits spatial relationships of anatomy within a specimen

Disadvantages
- Less accurate if many landmarks in a region are missing (common with fossils)
- Does not leverage additional covariation information in the sample

4: Regression Interpolation

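Note: Because the number of variables typically exceeds the number of specimens (\(\small{p>n}\)), the regression is usually performed on PLS scores rather than on the raw coordinates.
See Gunz et al. (2009). J. Hum. Evol.; Arbour and Brown (2014). Methods Ecol. Evol.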
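The core idea is to fit a regression of the missing coordinates on the observed coordinates using the complete specimens, then predict the missing values for the incomplete specimen. The Python sketch below uses ordinary least squares on raw variables and therefore only works when complete specimens outnumber predictors; it does not implement the PLS-score variant mentioned in the note. The names `regression_estimate` and `_solve` are hypothetical.

```python
def _solve(A, b):
    """Solve the square system A x = b by Gaussian elimination."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]  # partial pivoting
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def regression_estimate(complete, incomplete):
    """Predict missing values (None) of one specimen by OLS regression.

    complete: list of rows of flattened coordinates, no missing values.
    incomplete: one row in the same order, with None for missing values.
    Assumes more complete specimens than observed variables (n > p).
    """
    miss = [j for j, v in enumerate(incomplete) if v is None]
    pres = [j for j, v in enumerate(incomplete) if v is not None]
    # Design matrix with an intercept column.
    X = [[1.0] + [row[j] for j in pres] for row in complete]
    p = len(pres) + 1
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    x_new = [1.0] + [incomplete[j] for j in pres]
    filled = list(incomplete)
    for j in miss:
        Xty = [sum(r[a] * row[j] for r, row in zip(X, complete))
               for a in range(p)]
        beta = _solve(XtX, Xty)  # normal equations
        filled[j] = sum(b * x for b, x in zip(beta, x_new))
    return filled
```

With real landmark data the predictors are numerous and strongly correlated, so the normal equations above would be singular or unstable, which is the motivation for working with PLS scores instead.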

Regression Interpolation: Example 1

\(\small{D}_{Proc_{Ref-Orig}} = 0.26\) \(\small{D}_{Proc_{Ref-Est}} = 0.27\)

\(\small{D}_{Proc_{Orig-Est}} = 0.030\) Pretty good!

Regression Interpolation: Example 2

\(\small{D}_{Proc_{Ref-Orig}} = 0.15\) \(\small{D}_{Proc_{Ref-Est}} = 0.15\)

\(\small{D}_{Proc_{Orig-Est}} = 0.011\) Even Better!

Regression Interpolation

Advantages
- Exploits spatial relationships of anatomy within a specimen
- Leverages covariation between anatomical landmarks
- Leverages covariation within a sample

Disadvantages
- May be less accurate when small samples are examined

NOTE: Estimation may be further improved by considering within-sample variation (e.g., use specimens within a species)

Methods Comparisons

Flow of Computations

Missing Data: Conclusions